- aact_studies.tsv
- aact_drugs.tsv
- aact_drugs_leadmine.tsv
- aact_drugs_smi_pubchem_cid.tsv
- aact_drugs_smi_pubchem_cid2inchi.tsv
- aact_drugs_inchi2chembl.tsv
- aact_drugs_chembl_activity_pchembl.tsv
- aact_drugs_chembl_target_component.tsv
- pharos_targets.tsv
nct_id is the study ID.
## [1] "Thu Mar 28 17:07:27 2019"
library(readr)
library(data.table)
library(plotly, quietly=TRUE)
Read file of all studies in AACT.
## [1] "Total studies: 300214 ; unique NCT_IDs: 300214"
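The read-and-count step above can be sketched as follows. This is a minimal illustration in base R, assuming a tab-separated file with an nct_id column; the three inline rows are made-up stand-ins for aact_studies.tsv, not real AACT records, and the variable names are hypothetical.

```r
# Hypothetical three-row stand-in for aact_studies.tsv (not real AACT data).
studies_tsv <- "nct_id\tstudy_type\tphase
NCT00000001\tInterventional\tPhase 1
NCT00000002\tObservational\tNA
NCT00000003\tInterventional\tPhase 2"

# For the real file, pass a path instead of text=,
# e.g. read.delim("aact_studies.tsv", stringsAsFactors = FALSE).
studies <- read.delim(text = studies_tsv, stringsAsFactors = FALSE)

# If every row is a distinct study, the two counts agree.
n_total  <- nrow(studies)
n_unique <- length(unique(studies$nct_id))
msg <- sprintf("Total studies: %d ; unique NCT_IDs: %d", n_total, n_unique)
print(msg)
```

Equal counts, as in the report output, indicate that nct_id can serve as a unique key for later joins.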
Read file of all drugs in AACT.
id is AACT ID.
## [1] "Unique drug names: 91347 ; unique intervention IDs: 255077"
Select only Interventional studies (study_type) associated with drugs (via nct_id).
## [1] "Interventional studies: 237892 (79.2%)"
## [1] "Interventional drug studies: 124421 ; unique NCT_IDs: 124421"
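The two-step selection (interventional studies, then those linked to a drug via nct_id) might look like the sketch below. The data frames are small made-up stand-ins and the column layout is an assumption based on the field names mentioned in this report.

```r
# Hypothetical stand-ins for the studies and drugs tables (not real AACT data).
studies <- data.frame(
  nct_id     = c("NCT01", "NCT02", "NCT03", "NCT04"),
  study_type = c("Interventional", "Observational",
                 "Interventional", "Interventional"),
  stringsAsFactors = FALSE
)
drugs <- data.frame(
  id     = c(101L, 102L, 103L),          # AACT intervention ID
  nct_id = c("NCT01", "NCT01", "NCT03"),
  name   = c("aspirin", "placebo", "metformin"),
  stringsAsFactors = FALSE
)

# Step 1: keep interventional studies only (%in% treats NA as no match).
interventional <- studies[studies$study_type %in% "Interventional", ]
pct <- 100 * nrow(interventional) / nrow(studies)

# Step 2: keep interventional studies that have at least one drug row.
drug_studies <- interventional[interventional$nct_id %in% drugs$nct_id, ]

print(sprintf("Interventional studies: %d (%.1f%%)",
              nrow(interventional), pct))
print(sprintf("Interventional drug studies: %d ; unique NCT_IDs: %d",
              nrow(drug_studies), length(unique(drug_studies$nct_id))))
```

Filtering with %in% rather than a full merge keeps one row per study, which is why the study count and the unique NCT_ID count stay equal.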
| phase | N_studies | N_drugs |
|---|---|---|
| Early Phase 1 | 1574 | 2615 |
| Phase 1 | 23603 | 48593 |
| Phase 1/Phase 2 | 6663 | 13288 |
| Phase 2 | 33910 | 68850 |
| Phase 2/Phase 3 | 3305 | 6503 |
| Phase 3 | 22988 | 49507 |
| Phase 4 | 19593 | 36331 |
| NA | 12785 | 29390 |
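In the phase table, N_studies sums to 124421 and N_drugs sums to 255077, matching the interventional drug study count and the intervention ID count reported earlier, so N_drugs appears to count drug intervention records rather than distinct drugs. A per-phase tally of that kind could be sketched as below; the merged table and its column names are hypothetical.

```r
# Hypothetical merged table: one row per (study, drug intervention) pair.
study_drugs <- data.frame(
  nct_id = c("NCT01", "NCT01", "NCT02", "NCT03", "NCT03", "NCT03"),
  id     = c(101L, 102L, 103L, 104L, 105L, 106L),
  phase  = c("Phase 1", "Phase 1", "Phase 2", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Make the missing phase an explicit group so it is not silently dropped.
phase <- ifelse(is.na(study_drugs$phase), "NA", study_drugs$phase)

by_phase <- data.frame(phase = sort(unique(phase)), stringsAsFactors = FALSE)

# N_studies: distinct studies per phase; N_drugs: intervention rows per phase.
by_phase$N_studies <- sapply(by_phase$phase, function(p)
  length(unique(study_drugs$nct_id[phase == p])))
by_phase$N_drugs <- sapply(by_phase$phase, function(p) sum(phase == p))

print(by_phase)
```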
| overall_status | N |
|---|---|
| Completed | 145006 |
| Recruiting | 33973 |
| Terminated | 19618 |
| Unknown status | 18463 |
| Active, not recruiting | 13962 |
| Not yet recruiting | 8001 |
| NA | 7080 |
| Withdrawn | 6969 |
| Enrolling by invitation | 1060 |
| Suspended | 945 |
(To do: stack with study start_year.)
## Warning: Ignoring 1 observations
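AACT drug names are resolved by LeadMine to standard names and chemical structures, represented as SMILES. With structures in hand, drugs can be counted in a cheminformatically rigorous way, as distinct active pharmaceutical ingredients (APIs) rather than as free-text names.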
## [1] "Drug unique SMILES resolved by LeadMine: 4699 ; unique intervention IDs: 171741"
## [1] "Drugs (drug names) with resolved structure: 180555 / 197300 (91.5%)"
## [1] "Mentions by intervention ID: 157862 / 171741 (91.9%)"
## [1] "Mentions by study: 92966 / 99647 (93.3%)"
## [1] "Mentions by drug name: 11108 / 58297 (19.1%)"
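The coverage figures above share one resolved set but use different denominators, which is why coverage by mention or by study can be high while coverage by distinct drug name is low: a few frequently used names resolve, and a long tail of rarely used names does not. The sketch below illustrates the arithmetic on made-up mentions; the table, the rate() helper, and the column names are hypothetical.

```r
# Hypothetical drug mentions; smiles is NA where no structure was resolved.
mentions <- data.frame(
  id     = 1:8,
  nct_id = c("NCT01", "NCT01", "NCT02", "NCT03",
             "NCT04", "NCT05", "NCT05", "NCT06"),
  name   = c("aspirin", "aspirin", "aspirin", "aspirin",
             "metformin", "herbal mix A", "vaccine B", "device gel C"),
  smiles = c(rep("CC(=O)OC1=CC=CC=C1C(=O)O", 4),
             "CN(C)C(=N)NC(=N)N", NA, NA, NA),
  stringsAsFactors = FALSE
)
resolved <- !is.na(mentions$smiles)

# Coverage of a key = distinct resolved values / distinct values overall.
rate <- function(label, key) {
  n_hit <- length(unique(key[resolved]))
  n_all <- length(unique(key))
  sprintf("%s: %d / %d (%.1f%%)", label, n_hit, n_all, 100 * n_hit / n_all)
}

print(rate("Mentions by intervention ID", mentions$id))
print(rate("Mentions by study", mentions$nct_id))
print(rate("Mentions by drug name", mentions$name))
```

In this toy data, five of eight mentions resolve but only two of five distinct names do, the same pattern as in the report at a much smaller scale.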
## [1] "PubChem SMILES2CID hits: 3960 / 4698 (84.3%)"
## [1] "Intervention IDs mapped to PubChem CIDs (via SMILES): 153876"
## [1] "PubChem CIDs with InChIKeys: 3801"
## [1] "ChEMBL compounds mapped via InChIKeys: 3332"
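The identifier chain reported above (SMILES to PubChem CID, CID to InChIKey, InChIKey to ChEMBL) can be thought of as successive inner joins, each of which may lose rows. A minimal sketch, assuming three mapping tables with the hypothetical column names shown; all identifiers are placeholders, not real records.

```r
# Hypothetical mapping tables (placeholder identifiers only).
smi2cid <- data.frame(smiles = c("S1", "S2", "S3", "S4"),
                      cid    = c(11L, 12L, 13L, NA),
                      stringsAsFactors = FALSE)
cid2ik <- data.frame(cid      = c(11L, 12L),
                     inchikey = c("IK-A", "IK-B"),
                     stringsAsFactors = FALSE)
ik2chembl <- data.frame(inchikey  = "IK-A",
                        chembl_id = "CHEMBL-X",
                        stringsAsFactors = FALSE)

# Each step keeps only rows that matched in the previous one.
step1 <- smi2cid[!is.na(smi2cid$cid), ]            # SMILES with a CID
step2 <- merge(step1, cid2ik, by = "cid")          # CIDs with an InChIKey
step3 <- merge(step2, ik2chembl, by = "inchikey")  # InChIKeys found in ChEMBL

print(sprintf("SMILES2CID hits: %d / %d", nrow(step1), nrow(smi2cid)))
print(sprintf("CIDs with InChIKeys: %d", nrow(step2)))
print(sprintf("ChEMBL compounds mapped via InChIKeys: %d", nrow(step3)))
```

Reporting the row count after each join, as the report does, shows where compounds drop out of the chain.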
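Select only activities that have a pChEMBL value, as a confidence filter. pChEMBL is the negative base-10 logarithm of a molar potency or affinity measure (for example IC50, EC50, Ki, or Kd), so its presence indicates a comparable, quantitative measurement.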
## [1] "ChEMBL activities: 124438"
## [1] "ChEMBL activities molecules: 2287 ; targets: 3832 ; documents: 16198"
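The filter and the distinct counts above could be computed along these lines. This is a sketch on made-up rows; the column names are assumptions, not the actual schema of aact_drugs_chembl_activity_pchembl.tsv.

```r
# Hypothetical activity rows (placeholder identifiers only).
activities <- data.frame(
  molecule = c("CHEMBL-A", "CHEMBL-A", "CHEMBL-B", "CHEMBL-C"),
  target   = c("T1", "T2", "T1", "T3"),
  document = c("D1", "D1", "D2", "D3"),
  pchembl  = c(8.0, NA, 6.5, NA),
  stringsAsFactors = FALSE
)

# Keep only rows with a pChEMBL value.
with_pchembl <- activities[!is.na(activities$pchembl), ]

n_mol <- length(unique(with_pchembl$molecule))
n_tgt <- length(unique(with_pchembl$target))
n_doc <- length(unique(with_pchembl$document))
print(sprintf("ChEMBL activities: %d", nrow(with_pchembl)))
print(sprintf("ChEMBL activities molecules: %d ; targets: %d ; documents: %d",
              n_mol, n_tgt, n_doc))

# Worked conversion: a 10 nM potency is 1e-8 mol/L, so -log10 gives about 8.
p_example <- -log10(10e-9)
```

Because pChEMBL is on a log scale, each unit corresponds to a tenfold change in potency.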
## [1] "ChEMBL target proteins: 3157"
## [1] "ChEMBL target proteins mapped to TCRD (human): 1806"
## [1] "Organisms: 187"
| organism | N_targets |
|---|---|
| Homo sapiens | 1806 |
| Rattus norvegicus | 529 |
| Mus musculus | 238 |
| Bos taurus | 98 |
| Sus scrofa | 36 |
| Cavia porcellus | 26 |
| Escherichia coli K-12 | 19 |
| Oryctolagus cuniculus | 18 |
| Escherichia coli | 17 |
| Mycobacterium tuberculosis | 17 |
## [1] "Human targets: 1806"
| target_type | N |
|---|---|
| SINGLE PROTEIN | 1216 |
| PROTEIN COMPLEX | 247 |
| PROTEIN FAMILY | 210 |
| PROTEIN COMPLEX GROUP | 91 |
| PROTEIN-PROTEIN INTERACTION | 16 |
| SELECTIVITY GROUP | 14 |
| CHIMERIC PROTEIN | 12 |
## [1] "Human single-protein targets: 1216 ; unique UniProts: 1216"
## [1] " Tchem: 733" " Tclin: 341" " Tbio: 140"
## [4] " Tdark: 2"
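The final tally groups human single-protein targets by Pharos Target Development Level (TDL): Tclin, Tchem, Tbio, and Tdark. A minimal sketch of such a tabulation follows, assuming a table with one row per UniProt accession and a tdl column; the rows and column names are hypothetical.

```r
# Hypothetical target table (placeholder accessions, not real UniProt IDs).
targets <- data.frame(
  uniprot = c("U1", "U2", "U3", "U4", "U5"),
  tdl     = c("Tclin", "Tchem", "Tchem", "Tbio", "Tdark"),
  stringsAsFactors = FALSE
)

# Count targets per TDL, largest group first.
tdl_counts <- sort(table(targets$tdl), decreasing = TRUE)
print(sprintf(" %s: %d", names(tdl_counts), as.integer(tdl_counts)))

# The per-level counts should add up to the number of distinct targets.
n_targets <- length(unique(targets$uniprot))
```

In the report output the four levels sum to 1216 (733 + 341 + 140 + 2), the number of human single-protein targets.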